Towards Completely Lifted Search-based Probabilistic Inference 



David Poole, Fahiem Bacchus, and Jacek Kisynski 

http://www.cs.ubc.ca/~poole/

http://www.cs.toronto.edu/~fbacchus/

http://www.cs.ubc.ca/~kisynski/






Abstract 

The promise of lifted probabilistic inference is 
to carry out probabilistic inference in a rela- 
tional probabilistic model without needing to rea- 
son about each individual separately (grounding 
out the representation) by treating the undistin- 
guished individuals as a block. Current exact 
methods still need to ground out in some cases, 
typically because the representation of the inter- 
mediate results is not closed under the lifted op- 
erations. We set out to answer the question as to 
whether there is some fundamental reason why 
lifted algorithms would need to ground out un- 
differentiated individuals. We have two main re- 
sults: (1) We completely characterize the cases 
where grounding is polynomial in a population 
size, and show how we can do lifted inference in 
time polynomial in the logarithm of the popula- 
tion size for these cases. (2) For the case of no- 
argument and single-argument parametrized ran- 
dom variables where the grounding is not poly- 
nomial in a population size, we present lifted 
inference which is polynomial in the popula- 
tion size whereas grounding is exponential. Nei- 
ther of these cases requires reasoning separately 
about the individuals that are not explicitly men- 
tioned. 



1 Introduction 

The problem of lifted probabilistic inference in its general
form was first explicitly proposed by Poole [2003], who
formulated the problem in terms of parametrized random
variables, introduced the use of splitting to complement
unification, the parfactor representation of intermediate
results, and an algorithm for multiplying factors in a lifted
manner. De Salvo Braz et al. [2005; 2007] invented counting
elimination for some cases where grounding would create a
factor with size exponential in the number of individuals,
but lifted inference can be done by counting the number
of individuals with each assignment of values. Milch et al.
[2008] proposed counting formulae as a representation of
the intermediate result of counting, which allowed for more
cases where counting was applicable. However, this body
of research has not fulfilled the promise of lifted inference,
as the algorithms still need to ground in some cases. The
main problem is that the proposals are based on variable
elimination [Zhang and Poole, 1994]. This is a dynamic
programming approach which requires a representation of
the intermediate results, and the current representations for
such results are not closed under all of the operations used
for inference. We sought to investigate whether there were
fundamental reasons why we need to ground in some cases.

An alternative to variable elimination is to use search-based
methods based on conditioning such as recursive
conditioning [Darwiche, 2001] and other methods [see e.g.,
Bacchus et al., 2009]. The advantage of these methods is
that conditioning simplifies the representations, rather than
complicating them. The use of lifted search-based inference
was proposed by Gogate and Domingos [2010]; however,
being both correct and able to do inference without
grounding requires more attention to detail than given in
that paper. This paper answers different questions than Jha
et al. [2010].

Note that this paper is about exact inference. Lifted al- 
gorithms based on belief propagation (e.g. by Singla and 
Domingos [2008] and Kersting et al. [2009]) explicitly ig- 
nore the interdependence amongst the instances that exact 
inference needs to take into account. 

In deriving an algorithm that never needs to ground, it is 
often the examples that demonstrate why simpler methods 
do not work that are most insightful. We have thus chosen 
to write this paper by presenting examples that exemplify 
the cases that need to be considered. 











[Figure 1 graphics: panels (a), (b) and (c). The parfactors
shown in panel (c) are
$\langle \{\}, \{S(x,y), R(x,y)\}, \phi_1 \rangle$ and
$\langle \{\}, \{Q(x), R(x,y)\}, \phi_2 \rangle$.]

Figure 1: (a) a parametrized graphical model, (b) its grounding and (c) its parfactor graph 



2 Background 

The problem of lifted inference arises in relational prob- 
abilistic models where there are probability distributions 
over random variables that represent relations which de- 
pend on individuals. Poole [2003] gives an example where 
the probability that a person fitting a description committed 
a crime depends on the population size, as this determines 
how many other people fit the description. We don't want 
to reason about the other individuals separately. Rather, we 
would like to reason about them as a block considering only 
the number of such individuals. 

A population is a set of individuals. A population
corresponds to a domain in logic. The population size is the
cardinality of the population which can be a finite number.¹
For the examples below, where there is a single population,
we write the population as $A_1 \ldots A_n$, where $n$ is the
population size.

A parameter, which corresponds to a logical variable, is
written in lower case. Parameters are typed with a
population; if $x$ is a parameter of type $\tau$, $pop(x)$ is the
population associated with $x$ and
$|x| = |\tau| = |pop(x)|$. We assume
that the populations are disjoint (and so the types are
mutually exclusive). Constants are written starting with an
upper case letter.

A parametrized random variable (PRV) is of the form
$F(t_1, \ldots, t_k)$ where $F$ is a k-ary functor (a function
symbol or a predicate) and each $t_i$ is a parameter or a
constant. Each functor has a range, which is {True, False} for
predicate symbols. A parametrized random variable represents a
set of random variables, one for each assignment of an
individual to a parameter. The range of the functor becomes
the domain of the random variables.

A substitution is of the form
$\{x_1/t_1, \ldots, x_k/t_k\}$ where the $x_i$
are distinct parameters and the $t_i$ are constants or parameters,
such that $x_i$ and $t_i$ are of the same type. Given a PRV r
and a substitution $\theta = \{x_1/t_1, \ldots, x_k/t_k\}$,
the application of $\theta$ on r, written $r\theta$, is the PRV
with each $x_i$ replaced by $t_i$.



¹Infinite population sizes turn out to be simpler cases as
$p^\infty = 0$ for any $p < 1$. So the infinite case allows for
more pruning.



A substitution $\theta = \{x_1/t_1, \ldots, x_k/t_k\}$ grounds
parameters $x_1 \ldots x_k$ if $t_1 \ldots t_k$ are constants.
A grounding substitution of r is a substitution that grounds
all of the parameters of r.

Probabilistic inference relies on knowing whether two ran- 
dom variables are the same. With parametrized random 
variables, we unify them to make them identical, but in- 
stead of just applying a substitution (as in regular theorem 
proving), Poole [2003] proposed to split parametrized ran- 
dom variables, forming the unifier and residual PRVs. 

Example 1 Applying substitution {x/A, y/B} to PRV
Foo(x,y,z) results in a PRV that is the direct application,
Foo(A,B,z), and two "residual" PRVs, Foo(x,y,z) with the
constraint $x \neq A$ and Foo(A,y,z) with the constraint
$y \neq B$; these three parametrized random variables, with
their associated constraints, represent the same set of random
variables as Foo(x,y,z).
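The splitting in Example 1 can be sketched in a few lines of Python. This is only an illustration under assumed data structures: a PRV is a `(functor, arguments)` pair, a constraint `(parameter, constant)` stands for "parameter is not equal to constant", and the helper names `apply` and `split` are hypothetical, not taken from Poole [2003].

```python
from typing import Dict, List, Tuple

PRV = Tuple[str, Tuple[str, ...]]   # (functor, arguments); an assumed encoding
Constraint = Tuple[str, str]        # (parameter, constant): parameter != constant


def apply(prv: PRV, subst: Dict[str, str]) -> PRV:
    """Replace every parameter that the substitution binds."""
    functor, args = prv
    return functor, tuple(subst.get(a, a) for a in args)


def split(prv: PRV, subst: Dict[str, str]) -> List[Tuple[PRV, List[Constraint]]]:
    """Return the direct application followed by one residual per binding."""
    residuals = []
    bound: Dict[str, str] = {}
    for param, const in subst.items():
        # Earlier parameters are already bound; this one differs from its constant.
        residuals.append((apply(prv, bound), [(param, const)]))
        bound[param] = const
    return [(apply(prv, bound), [])] + residuals


result = split(("Foo", ("x", "y", "z")), {"x": "A", "y": "B"})
for part, constraints in result:
    print(part, constraints)
```

With a population of size n for every parameter, the three parts cover n + n·n·(n − 1) + n·(n − 1) = n³ ground instances, which is the size of the grounding of Foo(x,y,z).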

A parametrized graphical model (Bayesian network or 
Markov network) is a network with parametrized random 
variables as nodes, and the instances of these share poten- 
tials. We need to be explicit about which instances share 
potentials. 

Example 2 The parametrized graphical model of Figure
1 (a) is shown using parametrized random variables and
plates (where the parameters correspond to plates). The
plate representation represents $n$ instances of Q(x) and $n^2$
instances of R(x,y) in the grounding, shown in Figure 1 (b).

We assume the input to our algorithm is in the form of
parfactors. A parametric factor or parfactor [Poole, 2003] is
a triple $\langle C, V, \phi \rangle$ where C is a set of
inequality constraints on parameters, V is a set of parametrized
random variables and $\phi$ is a factor, which is a function from
assignments of values to V to the non-negative reals. $\phi$ is
used as the potential for all instances of the parfactor that
are consistent with the constraints.² Milch et al. [2008] also
explicitly include a set of parameters in their parfactors, but
we do not.

²This is known as parameter sharing, but where the parameters
are the parameters of the graphical model, not the individuals.
Unfortunately, the logical and probabilistic literature often
uses the same terminology for different things. Here we follow



A parfactor means its grounding: the set of factors on $V\theta$
(all with table $\phi$) for each grounding substitution $\theta$
of V that obeys the constraints in C.

2.1 Lifted Inference 

Lifted variable elimination, such as in C-FOVE [Milch 
et al., 2008], allows for inference to work at the lifted level 
(doing unification and splitting at runtime or as a prepro- 
cessing step) like normal variable elimination, until we re- 
move a PRV that contains the only instance of a free param- 
eter or is linked to a PRV with a different set of parameters. 
At this stage, we need to take into account that the PRV 
represents a set of random variables. 

Example 3 Suppose we have a factor on S(x,y), R(x,y)
and Q(x), as in Figure 1 (a). We can sum out all instances
of S(x,y), as all of the factors are the same, and get a new
factor on R(x,y); this does $n^2$ (identical) operations on the
grounding in a single step. If we then remove R, we are
multiplying a set of identical factors, and so can take their
value to the power of the population size [Poole, 2003].

Suppose, instead, we were to first sum out Q(x). In the
grounding, for each individual $A_i$, the random variables
$R(A_i, A_1) \ldots R(A_i, A_n)$ are interdependent and so
eliminating Q results in a factor on
$R(A_i, A_1) \ldots R(A_i, A_n)$. The size
of this factor is exponential in n.

De Salvo Braz et al. [2005] realized that the identity of the 
individuals is not important; only the number of individ- 
uals having each value of R. They introduced counting 
to solve cases such as removing Q(x) first in polynomial
time rather than the exponential time (and space) used for 
the ground case. They, however, do the counting and sum- 
ming in one step, which limits its applicability. Milch et al. 
[2008] defined counting formulae that give a representation 
for the resulting lifted formula and can then be combined 
with other factors. This expanded the applicability of lifted 
inference, but it still requires grounding in some cases. 

2.2 Search-based probabilistic inference 

An alternative to lifting variable elimination is to lift a
search-based method. The classic search-based algorithm
is recursive conditioning [Darwiche, 2001], a version of
which is presented in Algorithm 1.³ This algorithm is
presented in this non-traditional way, to emphasize the cases
that need to be implemented for lifting. In particular, de- 



Algorithm 1: Recursive Conditioning: rc(Con, Fs)

input:  Con: a set of variable = value assignments
        Fs: set of factors
output: a number representing Σ_x Π_{F ∈ Fs} F|Con

if vars(Con) ⊄ vars(Fs) then {Case 0}
    return rc({(x = v) ∈ Con : x ∈ vars(Fs)}, Fs)
if ∃v such that ⟨⟨Con, Fs⟩, v⟩ ∈ cache then {Case 1}
    return v
else if ∃f ∈ Fs : vars(f) ⊆ vars(Con) then {Case 2}
    Fo ← {f ∈ Fs : vars(f) ⊆ vars(Con)}
    return (Π_{f ∈ Fo} eval(f, Con)) × rc(Con, Fs \ Fo)
else if factor graph ⟨Con, Fs⟩ is disconnected then {Case 3}
    return Π_{connected component cc} rc(Con, cc)
else {Case 4}
    select variable x ∈ vars(Fs) \ vars(Con)
    sum ← 0
    for each value v of x do
        sum ← sum + rc({x = v} ∪ Con, Fs)
    cache ← cache ∪ {⟨⟨Con, Fs⟩, sum⟩}
    return sum



the traditions as much as seems sensible, and apologize for any
confusion. In particular "=" is used between a (parametrized)
random variable and its value, whereas "≠" and "/" are used for
parameters (logical variables).

³Typically recursive conditioning requires computing a
decomposition tree (D-Tree). Here we follow more closely the
approach of Bacchus et al. [2009], where we dynamically examine
the problem for disconnected components as the search proceeds.



coupling branching and the evaluation of factors is use- 
ful for developing its lifted counterpart. The correctness 
does not depend on the order of the cases (although effi- 
ciency does). In this algorithm Con is a context, a set of 
variable = value assignments, and Fs is a set of input
factors (this algorithm never creates or modifies factors; it
only evaluates them when all of their variables are assigned). We
separate the context from the factors; typically these are
combined to give what could be called partially-assigned
factors.⁴

In case 0, if there are variables that appear in Con that 
do not appear in Fs, these are removed from Con. This 
is called "forgetting" in the description below; we forget 
the context that is not relevant for the rest of the factors. 
vars{S) is the set of variables that appear in S. 

In case 1, the cache contains a set of previously computed
values. If a value has already been computed it can be
recalled. Initially the cache is
$\{\langle\langle\{\},\{\}\rangle, 1\rangle\}$.

For case 2, if all of the variables that appear in a factor f
are assigned in Con, eval(f, Con) returns the number that
is the value of f for the assignment Con. These numbers
are multiplied.

For case 3, a factor graph⁵ on $\langle Con, Fs \rangle$ is a
graph where the nodes are factors in Fs, and there is an arc between

⁴For the lifted case, projecting the context onto the factors
loses information that is needed; see Footnote 7. It also makes
it conceptually clearer that the factors share the same context.

⁵This is related to a factor graph of Frey [2003] but we don't
explicitly model the variable nodes.



factors that share a random variable that isn't assigned in 
Con. The connected components refer to the nodes that are 
connected in this graph. The connected components can be 
solved separately, and their return values multiplied. 

Case 4 branches on a variable x that isn't assigned. The 
efficiency, but not the correctness, of the algorithm depends 
on which variable x is selected to be branched on. 

To compute $P(x \mid Obs)$, for each value v of x, call
$rc(\{x = v\} \cup Obs, Fs)$ where Fs is the set of all factors
of the model, and normalize the results.

An aspect that is important for lifted inference is that when 
values are assigned, the factors are simplified as they are 
now functions of fewer variables. This should be contrasted 
to variable elimination that constructs more complicated 
factors. 

3 Search-based Lifted Inference 

In this section, we develop a lifted search-based algorithm. 
We show its correctness with respect to a parallel ground 
algorithm that uses the same order for splitting. Note that, 
because the lifted algorithm removes multiple variables at 
once, this restricts the order the variables are split in the 
ground algorithm. A legal ordering for the lifted inference 
branches on all instances of a PRV at once, whereas the 
corresponding ground algorithm branches on all instances 
sequentially. We show how the complexity (as a function 
of the population size) of the lifted algorithm is reduced 
compared to the corresponding ground algorithm. Because 
we want our algorithm to be correct for all legal branch- 
ing orderings, we ignore the selection of which variable to 
branch on; this can be optimized for efficiency. 

We assume that we can count the number of solutions to a 
CSP with inequality constraints in time that is at most loga- 
rithmic in the domain size of the variables (e.g., by adapting 
the #VE algorithm of Dechter [2003] to not enumerate the 
undistinguished variables). 

3.1 Intermediate Representations 

The lifted analogue of a context in Algorithm 1 is a counting
context which represents counts of assignments to
parametrized random variables.

A counting context on V is a pair $\langle V, \chi \rangle$,
where V is a set of PRVs (all taking a single argument of the
same type), all parametrized by the same parameter, and $\chi$ is
a table mapping assignments of PRVs in V into non-negative
integers. A counting context represents a context in the
grounding. For each tuple of values for the PRVs in V, the table
$\chi$ specifies how many of the individuals of the type take on
that tuple of values. We can also treat a counting context in
terms of $\chi$ as a set of pairs of the assignment of values to
V and the corresponding count.
Example 4 A counting context for V = {R(x), T(x)}, and
$\chi$ given by the table

R(x)     T(x)     count
True     True     20
True     False    40
False    True     10
False    False    20

represents the assignment of values to R and T for 90 (20 +
40 + 10 + 20) individuals. 20 of these have both R(...) and
T(...) true, 40 have R(...) true and T(...) false, etc.
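A counting context maps naturally onto a dictionary from value tuples to counts. The sketch below encodes Example 4 that way; the assignment of the counts 10 and 20 to the last two rows is an assumption based on the order of the sum 20 + 40 + 10 + 20, and the helper `marginal_count` is a hypothetical name used only for illustration.

```python
from collections import Counter

# V lists the PRVs; chi maps each tuple of values for V to a count.
V = ("R", "T")
chi = Counter({
    (True, True): 20,
    (True, False): 40,
    (False, True): 10,    # assumed row order, see the lead-in
    (False, False): 20,   # assumed row order, see the lead-in
})

# Number of individuals the counting context talks about.
population = sum(chi.values())


def marginal_count(chi, V, prv, value):
    """How many individuals have `prv` equal to `value`, ignoring other PRVs."""
    idx = V.index(prv)
    return sum(count for assignment, count in chi.items()
               if assignment[idx] == value)


r_true = marginal_count(chi, V, "R", True)
t_true = marginal_count(chi, V, "T", True)
```

The table has one row per tuple of values, so its size depends on the number of PRVs in V and not on the population size.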

There is a separate counting context for each type. A
current context is a set of pairs either of the form
$\langle Var, Value \rangle$ where Var is a PRV that has no
parameters, or of the form $\langle \tau, CC \rangle$ where
$\tau$ is a type and CC is a counting context where the parameter
of each of the PRVs in CC is of type $\tau$. A current context
can have at most one pair for each type.

A PRV P is assigned in a current context Con if it has no
parameters and $\langle P, Val \rangle \in Con$, or if it is
parametrized by a variable of type $\tau$ and
$\langle \tau, \langle V, \chi \rangle \rangle \in Con$ and P
unifies with an element of V.

A parfactor graph on $\langle Con, G \rangle$, where Con is a
current context and G is a set of parfactors, has the elements
of G as nodes, and there is an arc between parfactors
$\langle C_1, V_1, \phi_1 \rangle$ and
$\langle C_2, V_2, \phi_2 \rangle$ if there is an element of
$V_1$ that isn't assigned in Con that unifies with an element of
$V_2$ such that the unifier does not violate any of the
constraints in either parfactor.

The grounding of a parfactor graph on $\langle Con, G \rangle$
is a factor graph on $\langle Con', G' \rangle$, where for every
counting parfactor $\langle S, F, \phi \rangle$ in G, and for
every grounding substitution $\psi$ of all of the free
parameters that does not violate S, $F\psi$ is in $G'$ with
table $\phi$. $Con'$ represents all of the ground instances that
are assigned in Con, with the corresponding counts given by the
table in Con.

The grounding of a parfactor graph defines its semantics. 
We carry out lifted operations so that the lifted operations 
have the same result as carrying out recursive conditioning 
on the grounding of the parfactor graph for the same elim- 
ination ordering. 

3.2 Symmetry and Exchangeability 

The reason we can do lifted inference is because of symme- 
tries. Having a symmetry between the unnamed individuals 
means that a derivation about some of the individuals can 
be equally applied to any of the other individuals. 

We say that a set of individuals are exchangeable in a par- 
factor graph if the grounding of the parfactor graph with 
one consistent assignment of individuals to variables is iso- 
morphic to the grounding of the graph with another assign- 
ment. Graph isomorphism means there is a 1-1 and onto 




Figure 2: A parfactor graph that is problematic for shatter- 
ing and its preemptively shattered counterpart 

mapping between the nodes where the factors are identical. 
Exchangeability means that reasoning with some of the in- 
dividuals can be applied to the other individuals. 

3.3 Unification, splitting and shattering

In order to determine which instances of parametrized ran- 
dom variables are the same random variables, Poole [2003] 
used unification and splitting on logical variables, which 
guarantees that the instances are identical or are disjoint. 
De Salvo Braz et al. [2005] proposed to do all of the split- 
ting up-front in an operation called shattering (see Kisynski 
and Poole [2009] for analysis of splitting, shattering and re- 
lated operations). 

Shattering is a local operation and does not imply graphs 
constructed by substitutions are isomorphic as in the fol- 
lowing example: 

Example 5 Consider the network of Figure 2 (a). Although
it is shattered, the instances with x = y have a different
grounding from the instances where $x \neq y$. This graph
can be split on x = y in the right-hand parfactors, giving
the network shown in Figure 2 (b).

An alternative to the local shattering is to carry out a more
global preemptive shattering. A set of parfactors is
preemptively shattered if

• for every type, and every constant C of the type that
is explicitly mentioned, every parfactor that contains
a variable x of the type includes the constraint $x \neq C$.

• if variables x and y of the same type are in a parfactor,
the parfactor contains the constraint $x \neq y$.

Given a set of parfactors, to construct an equivalent
preemptively shattered set of parfactors, all logical
variables in a parfactor are split with respect to all explicitly
given constants, and any pairs of logical variables in a
parfactor are split with respect to each other.

Preemptive shattering gives more splits than shattering, and 
sometimes more than needed, but it allows our proofs to 
work and does not prevent the asymptotic complexity re- 
sults we seek. With preemptive shattering, counting the 
number of instances represented by a parfactor is straight- 
forward; there are no complex interactions. For the rest of 
this paper, we assume that all parfactors are preemptively 
shattered. 
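As an illustration of why counting instances becomes straightforward, consider a preemptively shattered parfactor with k logical variables of a single type. Under the assumption (ours, for this sketch) that the type has n individuals of which `named` are explicitly mentioned constants, every variable must avoid the named constants and the variables must be pairwise different, so the number of ground instances is a falling factorial. The function names below are hypothetical.

```python
from itertools import permutations
from math import perm


def count_instances(n: int, named: int, k: int) -> int:
    """Closed form: (n - named)(n - named - 1)...(n - named - k + 1)."""
    return perm(n - named, k)


def enumerate_instances(n: int, named: int, k: int) -> int:
    """Brute force: list every grounding over the unnamed individuals."""
    unnamed = range(named, n)
    return sum(1 for _ in permutations(unnamed, k))


closed_form = count_instances(6, 2, 3)
brute = enumerate_instances(6, 2, 3)
```

The closed form needs only arithmetic on n, whereas the enumeration touches every ground instance; parfactors with several types would multiply one such term per type.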

Note that, as can be seen in the parfactor graph of Figure 
2 (b), even after preemptive shattering, we cannot always 
globally rename variables so that the unifying variables are 
identical. 

3.4 Disconnected Grounding 

When the graph is disconnected, Algorithm 1 considers the
connected components separately, and multiplies them. In 
this section, we cover all of the cases where the grounding 
is disconnected, and show how it corresponds to operations 
in the lifted case. 

If the lifted network is disconnected, the ground counter- 
part is disconnected, and so these disconnected components 
can be solved separately and multiplied. 

If the lifted network is connected, this does not imply that 
its grounding is connected. For example, the parfactor 
graph of Figure 1 (c) is connected yet its grounding is not 
connected. 

Intuitively, if there is a logical variable that is in all of
the counting parfactors, the instances for one individual
are disconnected from the instances for another individual.
Thus, we can solve the problem for one of the individuals,
and the value for the lifted case is that value to the
power of the number of individuals.

This intuition needs to be refined because logical variables 
are local to a parfactor; renaming the variables gives ex- 
actly the same grounding. There are cases where chains of 
unifications cause connectedness: 

Example 6 The parfactor graph

$\langle \{x \neq y\}, \{S(x), R(x,y), Q(x,y)\}, \phi_1 \rangle$

$\langle \{x \neq z\}, \{S(x), R(z,x), S(x,z)\}, \phi_2 \rangle$

does not have disconnected ground instances, even though
x is in every PRV. For three individuals, $C_1, C_2, C_3$, in
the grounding $R(C_1, C_3)$ is connected to $R(C_2, C_3)$ through
$S(C_3)$ in the grounding of the bottom parfactor, thus $S(C_1)$
is connected to $S(C_2)$ for any different $C_1$ and $C_2$, using
the top parfactor.

This reasoning can be applied generally:

Suppose x is a logical variable in parfactor
$\langle C, V, \phi \rangle$ that appears in parfactor graph G.
$connected(x, \langle C, V, \phi \rangle, Con, G)$
means the instances of x in parfactor
$\langle C, V, \phi \rangle$ are connected to each other in the
grounding of $\langle Con, G \rangle$.
connected can be defined recursively as follows.

$connected(x, \langle C, V, \phi \rangle, Con, G)$ is true if and
only if:

• x appears in V, is not assigned in Con and there
is a PRV in V that is not assigned in Con and not
parametrized by x, or

• there is a parfactor $\langle C', V', \phi' \rangle$ in G,
such that an element of V unifies with an element of $V'$ (in a
manner consistent with C and $C'$, and that are not assigned
in Con) and x is unified with a variable $x'$ such that
$connected(x', \langle C', V', \phi' \rangle, Con, G)$ is true.

The definition of connected is sound: if
$connected(x, \langle C, V, \phi \rangle, Con, G)$ is true, the
instances of x are connected in the grounding. The proof for the
soundness is a straightforward induction proof; essentially
the algorithm is a constructive proof.

However, the definition is not complete: there can be in- 
stances that are connected even though connected is false. 
It is instructive to see what a proof for completeness would 
look like. To prove completeness, we would prove that all 
instances of x are disconnected if the above construction 
fails to derive that they are connected. Suppose there are two
constants C and C'; we need to show that the graph with
x replaced by C is disconnected from the graph with x
replaced by C'. The graph with x replaced by C has C in
every PRV (by construction) and the graph with x replaced
by C' has C' in every PRV. However, this does not imply
that the graphs are disconnected as there could be a PRV 
that contains both C and C', as in the following example: 

Example 7 Consider the parfactor graph:

$\langle \{x \neq y\}, \{S(x,y), R(x,y)\}, \phi_1 \rangle$

$\langle \{w \neq z\}, \{S(w,z), R(z,w)\}, \phi_2 \rangle$

In the grounding, for all individuals $C_i \neq C_j$, the random
variable $S(C_i, C_j)$ is connected to $S(C_j, C_i)$. However, it
is disconnected from other instances of S(x,y).

We use the definition of connected to detect potentially 
disconnected components, and we can explicitly check for 
which instances are connected. In this way, we can ensure 
that the lifted algorithm detects disconnectedness whenever 



the ground algorithm would. The only counterexamples to
the completeness of connected are when there is a set of
variables, all with the same domain, and all of them
appear in all PRVs in the parfactor graph (possibly renamed),
and there is an inequality constraint between them.
Suppose there are k such variables, $x_1, \ldots, x_k$, in a
parfactor. We choose k constants⁶, $C_1, \ldots, C_k$, and apply
the substitution $\{x_1/C_1, \ldots, x_k/C_k\}$ to that parfactor,
and then proceed to ground out all the corresponding variables
in the other factors by unifying with the factors in all ways,
forming a generic connected component. We then need to count the
number of copies of each PRV instance in the connected
component; suppose this is c. For a population size of
n, there are $n(n-1) \cdots (n-k+1)$ instances of the PRV,
and there are c elements in each connected component,
therefore there are $n(n-1) \cdots (n-k+1)/c$ disconnected
components. If we compute p as the probability of the
generic connected component, we need to take p to the
power $n!/((n-k)! \times c)$ to compute the probability of the
lifted network.

In the above analysis, n is the population size and k and c 
depend only on the structure of the graph, and not on the 
population size. As we assume that we can count the pop- 
ulation size in time logarithmic in the population size, the 
above procedure is polynomial in the log of the popula- 
tion size, whereas grounding is polynomial in the popula- 
tion size. As we see below, this is the only case where the 
grounding is polynomial in the population size. 

In Example 7, k = 2 and c = 2, and so the power is
$n(n-1)/2$. In an example with k = 3, it is possible that c
could be 1, 2, 3 or 6.
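The count of disconnected components can be checked on small populations by grounding Example 7 explicitly. The following sketch (an illustration only; the union-find encoding and the function names are our assumptions) links the ground random variables that share a ground factor, counts the resulting components, and compares the count with the exponent n!/((n − k)! × c) for k = 2 and c = 2.

```python
from math import factorial


def ground_components(n: int) -> int:
    """Count connected components in the grounding of Example 7."""
    parent = {}

    def find(v):
        # Insert unseen nodes as their own root; halve paths on the way up.
        while parent.setdefault(v, v) != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    def union(a, b):
        parent[find(a)] = find(b)

    for i in range(n):
        for j in range(n):
            if i != j:
                union(("S", i, j), ("R", i, j))  # first parfactor: x = i, y = j
                union(("S", i, j), ("R", j, i))  # second parfactor: w = i, z = j
    return len({find(v) for v in list(parent)})


def lifted_exponent(n: int, k: int, c: int) -> int:
    """The power to which the generic component's value is raised."""
    return factorial(n) // (factorial(n - k) * c)


counts = {n: (ground_components(n), lifted_exponent(n, 2, 2))
          for n in range(2, 7)}
```

Each component contains S(C_i, C_j), S(C_j, C_i), R(C_i, C_j) and R(C_j, C_i), so there is one component per unordered pair of individuals.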

Algorithm 2 gives the lifted variant of case 3 of Algorithm
1. The main loop is the same as Algorithm 1, with the
recursive call lrc(Con, Fs) where Con is a current context
and Fs is a set of counting parfactors.

3.5 Counting 

Once we have a single connected component (and there is 
no logical variable for which its instances are connected), 
we select a PRV to branch on. In describing this, as in Al- 
gorithm 1, we decouple branching on a variable, and eval- 
uating parfactors. Typically a PRV is associated with many 
parfactors, and when we branch on a PRV we need to count 
the number of instances with various values for the PRV. 
We need to make sure that we branch in a way that enables 
us to evaluate the relevant parfactors.

For the simplest case, assume we want to sum out a 
Boolean PRV F(x) that has one free logical variable, x, with 



⁶These should be constants that don't otherwise appear in the
current set of counting parfactors. We need to choose the con- 
stants so that the same constants are used in different branches, to 
ensure that caching finds identical instances whenever the ground 
instances are found in the cache. 



Algorithm 2: Lifted search, case 3: grounding is disconnected.

if Fs is disconnected then {Case 3a}
    return Π_{connected component cc} lrc(Con, cc)
else if ∃x ¬connected(x, F, Con, Fs) for F ∈ Fs then {Case 3b}
    select one x such that ¬connected(x, F, Con, Fs)
    let τ be the type of x
    let C = {x of type τ : ¬connected(x, F, Con, Fs)} for some F ∈ Fs
    let k = |C|
    replace x_1, ..., x_k with C_1, ..., C_k in F
    unify F with all other factors in Fs
    let c = |{instances of F in Fs}|
    let e = n!/((n − k)! × c)
    return lrc(Con, Fs)^e



domain $\{A_1, \ldots, A_n\}$. The idea behind counting [de Salvo
Braz et al., 2007] is that only the number of exchangeable 
individuals that have a PRV having a particular value mat- 
ters, not their identity. We present counting first consider- 
ing this simple case, then more complex cases. 

In counting branching, for each $i$ such that $0 \le i \le n$,
the algorithm generates the branch where there are $i$ instances
of F true, and so $n - i$ instances of F false. This branch
represents $\binom{n}{i}$ paths in the grounding, as there are
this many renamings of constants that would result in the same
assignment. Thus it can multiply the result of evaluating this
branch by $\binom{n}{i}$. Note that the counting branching
involves generating $n+1$ branches, whereas in the grounding
there are $2^n$ assignments of values after the equivalent ground
branching. The resulting counting context records the number of
instances of F that are true and the number that are false.
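The saving from counting branching can be seen in a small numeric check. In this sketch the value of a branch is assumed to depend only on how many instances of F are true (the hypothetical `weight` function below); under that assumption the n + 1 weighted lifted branches give the same total as the 2^n ground assignments.

```python
from itertools import product
from math import comb


def ground_sum(n, weight):
    """Enumerate all 2^n assignments to F(A_1), ..., F(A_n)."""
    return sum(weight(sum(bits), n) for bits in product((0, 1), repeat=n))


def lifted_sum(n, weight):
    """One branch per count i, weighted by the number of ground paths."""
    return sum(comb(n, i) * weight(i, n) for i in range(n + 1))


def weight(i, n):
    # Hypothetical branch value: 2 per true instance, 3 per false instance.
    return 2 ** i * 3 ** (n - i)


n = 6
ground_total = ground_sum(n, weight)
lifted_total = lifted_sum(n, weight)
```

For this particular weight the binomial theorem gives (2 + 3)^n, which both functions reproduce; the lifted sum does so with n + 1 terms.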

We now show how to evaluate various cases of parfactors 
that can include F. The general case is a combination of 
these specific cases. For all of the examples below, assume
they are part of a larger parametrized graphical model. In 
particular, assume that the instances are connected, so that 
the case described in the previous section does not apply. 

Example 8 Consider the parfactor

$\langle \{\}, \{F(x), E\}, \phi_1 \rangle$

Suppose $|x| = n$. This parfactor represents n factors.
Suppose $\phi_1$ is:

F(x)     E        $\phi_1$
True     True     $a_1$
True     False    $a_2$
False    True     $a_3$
False    False    $a_4$

Suppose we have split on E and assigned it the value
True, and then we split on F(x) and are in the branch with
F = True for i cases and F = False for n − i cases. This is
represented by the current context
$\{\langle E, true \rangle, \langle \tau, \chi \rangle\}$, where
$\tau$ is the type of x and $\chi$ is the table:

F(x)     count
True     $i$
False    $n - i$

The contribution of this parfactor in this current context is

$$a_1^{i} \, a_3^{n-i}$$
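The contribution in Example 8 can be checked against the grounding. The sketch below uses arbitrary illustrative numbers for the table entries (distinct primes, so that equal products imply equal exponents) and hypothetical function names; it multiplies the ground factors one individual at a time and compares the result with the power form that uses only the counts.

```python
# Illustrative table for phi_1, keyed by (value of F(x), value of E).
PHI1 = {(True, True): 2, (True, False): 3, (False, True): 5, (False, False): 7}


def ground_contribution(f_values, e, phi):
    """Multiply one ground factor per individual."""
    result = 1
    for f in f_values:
        result *= phi[(f, e)]
    return result


def lifted_contribution(i, n, e, phi):
    """Use only the counts: i individuals with F true, n - i with F false."""
    return phi[(True, e)] ** i * phi[(False, e)] ** (n - i)


f_values = [True, True, True, False, False]   # n = 5, i = 3
ground = ground_contribution(f_values, True, PHI1)
lifted = lifted_contribution(3, 5, True, PHI1)
```

The lifted form does a constant number of exponentiations regardless of n, while the ground product has one term per individual.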



Example 9 Consider the parfactor

$\langle \{\}, \{F(x), G(y)\}, \phi_2 \rangle$

Suppose x and y are of different types, where $|x| = n$ and
$|y| = m$. $\phi_2$ is:

F(x)     G(y)     $\phi_2$
True     True     $a_1$
True     False    $a_2$
False    True     $a_3$
False    False    $a_4$

This parfactor represents nm factors; for each combination
of assignments of values to the instances of F and G, there
is a factor. In a current context with i F's true and h G's
true, this parfactor has a contribution:

$$a_1^{ih} \, a_2^{i(m-h)} \, a_3^{(n-i)h} \, a_4^{(n-i)(m-h)}$$
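For Example 9 the exponents are products of counts, because every instance of F is paired with every instance of G. A small check, again with illustrative prime table entries and hypothetical function names, compares the pairwise ground product with the closed form.

```python
# Illustrative table for phi_2, keyed by (value of F(x), value of G(y)).
PHI2 = {(True, True): 2, (True, False): 3, (False, True): 5, (False, False): 7}


def ground_contribution(f_values, g_values, phi):
    """One ground factor for every pair (x, y) of individuals."""
    result = 1
    for f in f_values:
        for g in g_values:
            result *= phi[(f, g)]
    return result


def lifted_contribution(i, n, h, m, phi):
    """i of n instances of F are true, h of m instances of G are true."""
    return (phi[(True, True)] ** (i * h)
            * phi[(True, False)] ** (i * (m - h))
            * phi[(False, True)] ** ((n - i) * h)
            * phi[(False, False)] ** ((n - i) * (m - h)))


f_values = [True, True, False]           # n = 3, i = 2
g_values = [True, False, False, False]   # m = 4, h = 1
ground = ground_contribution(f_values, g_values, PHI2)
lifted = lifted_contribution(2, 3, 1, 4, PHI2)
```

The four exponents sum to nm, the number of ground factors the parfactor represents.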



Example 10 Consider the parfactor 

$\langle \{\}, \{F(x), G(x)\}, \phi_2 \rangle$ 

Suppose $|x| = n$. This parfactor represents $n$ factors. Unlike 
the previous cases, counting branching is not adequate; 
we need to consider which $F$-assignments go with which 
$G$-assignments. We can do a counting branch on $F$ first: 
for each $i \in [0, n]$, consider the case where $F$ is true for $i$ 
individuals, and is false for $n - i$ individuals. This case 
represents $\binom{n}{i}$ branches in the grounding. To split on $G$ we can 
do a dependent branch: consider the $i$ individuals for which 
$F$ is true, and the $n - i$ individuals for which $F$ is false, 
separately. For each $j \in [0, i]$, we consider the branch where $G$ is 
true for $j$ individuals and false for $i - j$ individuals, all with 
$F = True$; this branch corresponds to $\binom{i}{j}$ ground branches. 
For each $k \in [0, n - i]$ we construct the branch where $G$ is 
true for $k$ individuals and is false for $n - i - k$ individuals 
with $F = False$. This represents $\binom{n-i}{k}$ ground cases. This 
branch is represented by the counting context: 



    F(x)     G(x)     #
    True     True     $j$
    True     False    $i - j$
    False    True     $k$
    False    False    $n - i - k$



The contribution of the parfactor in this branch is: 

$$\alpha_1^{j} \, \alpha_2^{i-j} \, \alpha_3^{k} \, \alpha_4^{n-i-k}$$
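One way to check the counting argument of Example 10 is to sum the contribution of every counting branch, weighted by the number of ground branches it represents, and compare the total with the sum over all ground assignments. The sketch below does this for a small population; the function names and the flat argument list `a1..a4` (standing for the four table entries) are assumptions for illustration.

```python
from itertools import product
from math import comb, isclose

def lifted_sum_ex10(a1, a2, a3, a4, n):
    """Sum over counting branches (i, j, k), each weighted by its multiplicity."""
    total = 0.0
    for i in range(n + 1):              # F true for i individuals
        for j in range(i + 1):          # of those, G true for j
            for k in range(n - i + 1):  # of the F-false ones, G true for k
                mult = comb(n, i) * comb(i, j) * comb(n - i, k)
                total += (mult * a1 ** j * a2 ** (i - j)
                          * a3 ** k * a4 ** (n - i - k))
    return total

def ground_sum_ex10(a1, a2, a3, a4, n):
    """Sum over all assignments to the 2n ground variables."""
    table = {(True, True): a1, (True, False): a2,
             (False, True): a3, (False, False): a4}
    total = 0.0
    for fs in product([True, False], repeat=n):
        for gs in product([True, False], repeat=n):
            term = 1.0
            for f, g in zip(fs, gs):
                term *= table[(f, g)]
            total += term
    return total

assert isclose(lifted_sum_ex10(2.0, 3.0, 5.0, 7.0, 3),
               ground_sum_ex10(2.0, 3.0, 5.0, 7.0, 3))
```

For this isolated parfactor both sums equal $(\alpha_1 + \alpha_2 + \alpha_3 + \alpha_4)^n$, since the individuals are independent; the lifted loop visits a polynomial number of branches while the ground loop visits $4^n$ assignments.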



Example 11 Consider the parfactor 

$\langle \{x \neq y\}, \{F(x), G(y)\}, \phi_2 \rangle$ 

Suppose $x$ and $y$ are of the same type, where $|x| = |y| = n$. 
This parfactor represents $n(n - 1)$ factors. This can be 
solved by a mix of the previous two examples. If we were 
to do the same as Example 9, we would also include the 
cases where $x = y$, which are explicitly excluded; but these 
are the cases in Example 10. So the contribution of this 
factor can be computed by dividing the result of Example 
9 by the result of Example 10, or equivalently subtracting 
the exponents. As in Example 10, we consider the case 
where $F$ is true for $i$ individuals, and for these individuals 
$G$ is true for $j$ of them, and out of the individuals where 
$F$ is false, $G$ is true for $k$ of them. Taking the difference 
between the exponents of Examples 9 and 10, and noticing that 
$h$ in Example 9 corresponds to $j + k$ in Example 10 (and $m$ 
to $n$), the contribution of these factors is: 

$$\alpha_1^{i(j+k)-j} \, \alpha_2^{i(n-j-k)-i+j} \, \alpha_3^{(n-i)(j+k)-k} \, \alpha_4^{(n-i)(n-j-k)-n+i+k}$$
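The subtraction of exponents in Example 11 can be checked against a direct product over all ordered pairs of distinct individuals. This is a minimal sketch under assumed names and values; it takes the Example 9 exponents with $h = j + k$ and $m = n$ and subtracts the Example 10 exponents.

```python
from itertools import product
from math import isclose

TT, TF, FT, FF = (True, True), (True, False), (False, True), (False, False)
# Hypothetical encoding of phi_2: keys are (value of F(x), value of G(y)).
ALPHA = {TT: 2.0, TF: 3.0, FT: 5.0, FF: 7.0}

def lifted_contribution_ex11(alpha, i, j, k, n):
    """Exponents of Example 9 (h = j + k, m = n) minus those of Example 10."""
    h = j + k
    ex9 = {TT: i * h, TF: i * (n - h), FT: (n - i) * h, FF: (n - i) * (n - h)}
    ex10 = {TT: j, TF: i - j, FT: k, FF: n - i - k}
    result = 1.0
    for key, value in alpha.items():
        result *= value ** (ex9[key] - ex10[key])
    return result

def ground_contribution_ex11(alpha, f_values, g_values):
    """One ground factor per ordered pair (x, y) with x != y."""
    n = len(f_values)
    result = 1.0
    for x in range(n):
        for y in range(n):
            if x != y:
                result *= alpha[(f_values[x], g_values[y])]
    return result

n = 3
for fs in product([True, False], repeat=n):
    for gs in product([True, False], repeat=n):
        i = sum(fs)
        j = sum(1 for f, g in zip(fs, gs) if f and g)
        k = sum(1 for f, g in zip(fs, gs) if not f and g)
        assert isclose(lifted_contribution_ex11(ALPHA, i, j, k, n),
                       ground_contribution_ex11(ALPHA, fs, gs))
```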

Example 12 Consider a mix between the previous examples. 
Suppose we have the parfactors: 

$\langle \{\ldots\}, \{F(x), G(x), \ldots\} \rangle$ 

$\langle \{x \neq y, \ldots\}, \{H(y), G(x), \ldots\} \rangle$ 

where all of the variables are of the same type with $n$ 
individuals. Suppose the branching order is to branch on $H$, 
then $F$, then $G$. The split on $G$ needs to depend on both 
$H$ and $F$. This can be done if the splits on $H$ and $F$ are 
dependent; that is, we do a separate count on $F$ for the 
individuals for which $H$ is true and the individuals for which 
$H$ is false. Then we do a separate count¹ on $G$ for the set 
of individuals for each combination of values to $H$ and $F$. 

Counting branching needs to be expanded to cascaded 
counting branching. Dependent counting branching on a 
PRV $X$ that is parametrized by a parameter of type $T$ works 
as follows. First, we find the corresponding counting context 
$\langle V, \chi \rangle$ for $T$. Dependent counting branching replaces 
this with a counting context $\langle V \cup \{X\}, \chi' \rangle$ as follows. 
For each assignment $\langle t, i \rangle$ in the table $\chi$ ($i$ is the count for 
assignment $t$), for each $j$ in $[0, i]$, we create the table $\chi'$ that 
maps $t \cup \{X = true\}$ to $j$ and $t \cup \{X = false\}$ to $i - j$. This 
assignment corresponds to $\binom{i}{j}$ different grounding 
assignments, so the result for this branch needs to be multiplied 
by $\binom{i}{j}$. This is recursively carried out for each tuple. 
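A single dependent counting branching step can be sketched as a generator that takes a counting table and yields each refined table together with the number of ground assignments it stands for. The representation below (a dictionary from frozensets of `(PRV name, value)` pairs to counts) and the name `dependent_branch` are assumptions for illustration, not the paper's data structures.

```python
from itertools import product
from math import comb

def dependent_branch(chi, prv):
    """Yield (chi_new, multiplicity) for every way of splitting each row of chi.

    chi maps an assignment (a frozenset of (PRV name, value) pairs) to the
    number of individuals with that assignment. Each row <t, i> is split into
    <t + {prv = True}, j> and <t + {prv = False}, i - j> for j in [0, i].
    """
    rows = list(chi.items())
    for js in product(*(range(count + 1) for _, count in rows)):
        chi_new = {}
        multiplicity = 1
        for (t, count), j in zip(rows, js):
            chi_new[t | {(prv, True)}] = j
            chi_new[t | {(prv, False)}] = count - j
            multiplicity *= comb(count, j)  # ground assignments per row
        yield chi_new, multiplicity

# Context of Example 8 with n = 3 and i = 2, refined by branching on G.
chi_f = {frozenset({("F", True)}): 2, frozenset({("F", False)}): 1}
branches = list(dependent_branch(chi_f, "G"))
```

This sketch keeps rows with a zero count; a sparse representation would drop them.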

Example 13 Starting from the current context of Example 8, 
dependent counting branching on $G(x)$ produces the 
counting context of Example 10. This context corresponds 
to $\binom{i}{j} \binom{n-i}{k}$ contexts in the grounding. Note that there are 
$i(n - i)$ leaves that are descendants of the current context 
created in Example 8, whereas in the grounding there are $2^n$ 
leaves that are descendants of each corresponding ground 
context. 

¹ Note that if we had projected the counts onto the separate 
factors, we would have lost the interdependence between $F$ and 
$H$, which is needed as the count for $G$ depends on both. 

Branching is shown as Case 4 in Algorithm 3. In this al- 
gorithm Con is a current context and Fs is a set of input 
parfactors. Case 4a is the same as case 4 in Algorithm 1 
(but for Boolean variables). Case 4b is for branching on a 
PRV with a single parameter, and sets up dependent count- 
ing branching that is presented in Algorithm 4. Note that 
this treats a counting context as a set of pairs of an assign- 
ment of values to a set of PRVs and a count (as described 
in Section 3.1). 

The branching factor depends on the population, but the 
depth of the recursive calls depends on the structure of the 
counting context, and not on the population size. The depth 
of the recursive calls provides the power of the polyno- 
mial. If we use a sparse representation of current contexts 
with zeros suppressed, this is never worse than grounding. 
[However, whether we use a sparse or dense representation 
is something that can be optimized for.] 
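To illustrate why the depth of the recursion gives the power of the polynomial, consider the special case (an assumption made here for illustration) of $k$ Boolean unary PRVs that share one parameter with population size $n$, all split by dependent counting branching. A complete counting context distributes $n$ individuals over $2^k$ joint assignments, so by the standard stars-and-bars count there are $\binom{n + 2^k - 1}{2^k - 1}$ of them, a polynomial in $n$ of degree $2^k - 1$, against $2^{kn}$ ground assignments. The function names are hypothetical.

```python
from math import comb

def lifted_leaves(n, k):
    """Number of complete counting contexts for k Boolean unary PRVs
    sharing one parameter with n individuals (zero counts included)."""
    cells = 2 ** k  # joint assignments to the k PRVs
    return comb(n + cells - 1, cells - 1)

def ground_leaves(n, k):
    """Number of assignments to the k * n ground Boolean variables."""
    return 2 ** (k * n)

for n in (5, 10, 20):
    print(n, lifted_leaves(n, 2), ground_leaves(n, 2))
```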

Algorithm 3: Lifted Recursive Conditioning: $lrc(Con, Fs)$ 

    input:  $Con$: current context
            $Fs$: set of parfactors
    output: a number representing $\sum_{x} \prod_{F \in grounding(Fs, Con)} F(x)$

    if $\exists X \in vars(Con) \setminus vars(Fs)$ then {Case 0}
        return $lrc(\sum_X Con, Fs)$
    if $\exists v$ such that $\langle \langle Con, Fs \rangle, v \rangle \in cache$ then {Case 1}
        return $v$
    else if $\exists f \in Fs : vars(f) \subseteq vars(Con)$ then {Case 2}
        return $eval\_parfactor(f, Con) \times lrc(Con, Fs \setminus \{f\})$
    else {Case 3}
        See Algorithm 2
    else {Case 4}
        select PRV $X \in vars(Fs) \setminus vars(Con)$
        if $X$ contains no parameters then {Case 4a}
            $sum \leftarrow lrc(\{X = true\} \cup Con, Fs) + lrc(\{X = false\} \cup Con, Fs)$
        else {Case 4b}
            suppose the parameter of $X$ is of type $T$
            if $\exists \chi : \langle T, \chi \rangle \in Con$ then
                $sum \leftarrow branch(\chi, X, \{\}, Con \setminus \{\langle T, \chi \rangle\}, Fs)$
            else
                $sum \leftarrow branch(\{\langle \{\}, |T| \rangle\}, X, \{\}, Con, Fs)$
        $cache \leftarrow cache \cup \{\langle \langle Con, Fs \rangle, sum \rangle\}$
        return $sum$
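The interplay of Case 4a and Case 4b can be sketched end to end on a small model that combines Examples 8 and 10: a parameterless variable $E$, and unary PRVs $F(x)$ and $G(x)$ over one population, with parfactors on $\{F(x), E\}$ and $\{F(x), G(x)\}$. This is a simplified illustration rather than the algorithm itself: it omits the cache and Cases 0 to 3, evaluates both parfactors only at the leaves, and uses hypothetical names (`branch`, `lifted_z`, `ground_z`) and made-up table values.

```python
from itertools import product
from math import comb, isclose

# Hypothetical tables: PHI1 is indexed by (F, E), PHI2 by (F, G).
PHI1 = {(True, True): 2.0, (True, False): 3.0,
        (False, True): 5.0, (False, False): 7.0}
PHI2 = {(True, True): 1.0, (True, False): 4.0,
        (False, True): 6.0, (False, False): 0.5}

def branch(chi, chi_new, leaf):
    """Recursion in the style of dependent counting branching.

    chi: tuple of (assignment, count) rows still to be split;
    chi_new: rows of the refined context built so far;
    leaf: function evaluating a complete refined context.
    """
    if not chi:
        return leaf(chi_new)
    (t, i), rest = chi[0], chi[1:]
    total = 0.0
    for j in range(i + 1):
        extended = chi_new + ((t + (True,), j), (t + (False,), i - j))
        total += comb(i, j) * branch(rest, extended, leaf)
    return total

def lifted_z(n):
    """Sum over all assignments of the product of all ground factors."""
    total = 0.0
    for e in (True, False):                  # split on the parameterless E
        def after_f(chi_f):                  # chi_f: counts over (F,)
            def after_g(chi_fg):             # chi_fg: counts over (F, G)
                value = 1.0
                for (f,), count in chi_f:
                    value *= PHI1[(f, e)] ** count
                for (f, g), count in chi_fg:
                    value *= PHI2[(f, g)] ** count
                return value
            return branch(chi_f, (), after_g)    # dependent branch on G
        total += branch((((), n),), (), after_f)  # counting branch on F
    return total

def ground_z(n):
    """Brute-force reference over the grounding (exponential in n)."""
    total = 0.0
    for e in (True, False):
        for fs in product((True, False), repeat=n):
            for gs in product((True, False), repeat=n):
                term = 1.0
                for f, g in zip(fs, gs):
                    term *= PHI1[(f, e)] * PHI2[(f, g)]
                total += term
    return total

assert isclose(lifted_z(4), ground_z(4))
```

The lifted version starts from the single row $\langle \{\}, n \rangle$, mirroring the initial call in Case 4b, and never refers to an individual by name.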



Algorithm 4: Dependent Counting Branching: $branch(\chi, X, \chi', Con, Fs)$ 

    input: $\chi$: a set of tuples from a counting context
           $X$: the PRV to branch on its instances
           $\chi'$: the new counting context being constructed
           $Con$: the current context to be added to
           $Fs$: the set of all factors

    if $\chi = \{\}$ then
        Suppose $T$ is the type of the parameter in $X$
        return $lrc(Con \cup \{\langle T, \chi' \rangle\}, Fs)$
    else
        select $\langle t, i \rangle \in \chi$
        $sum \leftarrow 0$
        for $j \in [0, i]$ do
            let $\chi''$ be $\chi' \cup \{\langle t \cup \{X = true\}, j \rangle, \langle t \cup \{X = false\}, i - j \rangle\}$
            $sum \leftarrow sum + \binom{i}{j} \, branch(\chi \setminus \{\langle t, i \rangle\}, X, \chi'', Con, Fs)$
        return $sum$

The main remaining part of the lifted algorithm is to evaluate 
a parfactor $\langle C, V, \phi \rangle$ in a counting context $\langle V', \chi \rangle$, 
where all of the variables in $V$ are assigned in $V'$. There 
are three cases: shared parameters, different parameters of 
the same type and parameters of different types. One 
parfactor can contain all of these. 

For shared parameters, as in Example 10, the parfactor provides 
the base, and there is a unique counting context that 
provides the powers. First we group all of these together 
and raise them to the appropriate powers, and then treat 
them as a block. 

For parameters of different types, as in Example 9, we need 
to multiply the powers. We can treat the shared parameters 
as a block. 

For different parameters of the same type, as in Example 
11, we can use the other two cases: first we treat them as 
different types (which over-counts because it includes the 
equality cases), and then divide by the case when they are 
equal. We also have to readjust for double counting, which 
can be done using the coefficients of $n! / (n - k)!$, where $k$ 
is the number of such cases. For example, when $k = 3$, 
this is $n(n - 1)(n - 2) = n^3 - 3n^2 + 2n$. The first of these 
terms corresponds to all parameters being different, the second 
to all pairs of parameters being equal, and the third to all 
parameters being the same. 
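The expansion of $n!/(n-k)!$ quoted above can be reproduced for any $k$ by multiplying out $n(n-1)\cdots(n-k+1)$; its coefficients are the signed Stirling numbers of the first kind. The sketch below only checks that expansion and does not implement the double-counting adjustment itself; the function name is a hypothetical one.

```python
def falling_factorial_coefficients(k):
    """Coefficients of n (n - 1) ... (n - k + 1), lowest power of n first."""
    coeffs = [1]                                   # the constant polynomial 1
    for r in range(k):
        shifted = [0] + coeffs                     # n * poly
        scaled = [-r * c for c in coeffs] + [0]    # -r * poly
        coeffs = [a + b for a, b in zip(shifted, scaled)]
    return coeffs

def evaluate(coeffs, n):
    """Evaluate a polynomial given lowest-power-first coefficients."""
    return sum(c * n ** power for power, c in enumerate(coeffs))

# k = 3 gives 2n - 3n^2 + n^3, matching the expansion in the text.
print(falling_factorial_coefficients(3))
```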

Algorithm 5 shows how to evaluate a parfactor in a current 
context. It omits the last case, as it is computed from the 
other two cases. 

Algorithm 5: Evaluating a parfactor in current context: 
$eval\_parfactor(PF, Con)$ 

    input: $PF$: a parfactor
           $Con$: a current context
    Suppose $PF$ is $\langle C, V, \phi \rangle$
    Suppose $Con$ is $\langle V', \chi \rangle$
    $prod \leftarrow 1$
    foreach $\langle t, p \rangle \in \phi$ do
        if $t$ is consistent with variable assignments in $Con$ then
            $power \leftarrow 1$
            Let $T$ be the type of $X$
            Select $\langle T, \langle V'', \chi \rangle \rangle \in Con$
            $RedundantVars \leftarrow vars(V'') \setminus vars(V)$
            $power \leftarrow power \times \sum_{\langle t', i \rangle \in \chi : consistent(t, t')} i$
            $prod \leftarrow prod \times p^{power}$
    return $prod$

Example 12 (cont.) Consider the branch where $H$ is true 
for $i$ individuals and false for $n - i$ individuals. Suppose we 
then branch on $F$. We then consider the branch with $F$ true 
for $j_0$ of the cases where $H$ is false and $j_1$ cases where $H$ 
is true. We thus have: $j_1$ individuals for which $F$ and $H$ 
are true; $i - j_1$ individuals which have $H$ true and $F$ false; 
$j_0$ individuals that have $H$ false and $F$ true; and $n - i - j_0$ 
individuals that have both $H$ and $F$ false. We can then 
branch on $G$, for each of the four sets of individuals. We 
thus know the counts of each case; Algorithm 5 computes 
the contribution of each factor. 
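For the shared-parameter case, evaluating a parfactor amounts to raising each table entry to the total count of the context rows that agree with it. The following minimal sketch covers only that case, with assumed list-of-pairs representations for the parfactor table and the counting context; it is not a full rendering of Algorithm 5, which also handles several types.

```python
def eval_parfactor_shared(phi, chi):
    """Evaluate a parfactor whose PRVs all share the counted parameter.

    phi: list of (row, p) pairs, where row maps PRV name -> value and p is
         the table entry for that row.
    chi: list of (assignment, count) pairs, where assignment maps PRV name ->
         value over a superset of the PRVs mentioned in phi (assumed).
    """
    prod = 1.0
    for row, p in phi:
        # Total count of individuals whose assignment agrees with this row.
        power = sum(count for assignment, count in chi
                    if all(assignment[v] == val for v, val in row.items()))
        prod *= p ** power
    return prod

# Context in the style of Example 12 (cont.) with i = 3, j1 = 2, j0 = 3, n = 10.
chi_hf = [({"H": True, "F": True}, 2), ({"H": True, "F": False}, 1),
          ({"H": False, "F": True}, 3), ({"H": False, "F": False}, 4)]
# A hypothetical parfactor table that mentions only F.
phi_f = [({"F": True}, 2.0), ({"F": False}, 3.0)]
value = eval_parfactor_shared(phi_f, chi_hf)
```

Rows of the context that differ only on variables the parfactor does not mention (here $H$) are pooled by the sum, so the parfactor on $F$ sees 5 true and 5 false individuals.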

The final two cases of the algorithm are caching (case 1 
of Algorithm 1) and forgetting (case 0 of Algorithm 1). 
Caching can remain the same; we just have to ensure that 
the cache can find elements that are the same up to renaming 
of variables, which can be done easily as the current 
context does not depend on the name of the variable and 
can be stored in a canonical way (e.g., alphabetically). 
Forgetting is the inverse of splitting. A variable in a counting 
context that doesn't appear in the parfactors can be summed 
out of the counting context (which is the same operation as 
summing out a variable in variable elimination). $\sum_X Con$ in 
Algorithm 3 means to sum out $X$ from the counting context 
it appears in, or to remove it if it is not a parametrized 
variable. 
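Summing a variable out of a counting context merges the rows that agree on the remaining variables and adds their counts. A minimal sketch, assuming the counting table is a dictionary from frozensets of `(PRV name, value)` pairs to counts (a representation chosen here for illustration):

```python
from collections import Counter

def sum_out(chi, prv):
    """Forget prv: merge rows that differ only in the value of prv."""
    merged = Counter()
    for assignment, count in chi.items():
        reduced = frozenset((v, val) for v, val in assignment if v != prv)
        merged[reduced] += count
    return dict(merged)

# Counting context over F and G with j = 2, i - j = 1, k = 3, n - i - k = 4.
chi_fg = {
    frozenset({("F", True), ("G", True)}): 2,
    frozenset({("F", True), ("G", False)}): 1,
    frozenset({("F", False), ("G", True)}): 3,
    frozenset({("F", False), ("G", False)}): 4,
}
chi_f = sum_out(chi_fg, "G")
```

Forgetting $G$ here recovers a context over $F$ alone with counts $i = 3$ and $n - i = 7$, which is why it is the inverse of splitting.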

This description assumed binary-valued variables, and only 
functors with 0 or 1 arguments. The first of these is 
straightforward to generalize, and the second is not. 

Consider what happens when $F$ can have more than two 
values. Suppose $F$ is a unary $m$-valued PRV with range 
$\{v_1, \ldots, v_m\}$. That is, $F(A_i)$ is a random variable with 
domain $\{v_1, \ldots, v_m\}$. The assignments we need to consider 
are when there are non-negative integers $i_1, \ldots, i_m$, where $i_j$ 
represents the number of individuals that have value $v_j$. Thus 
for each assignment to $i_1, \ldots, i_m$, where $i_j \geq 0$ for each $j$ and 
$i_1 + \cdots + i_m = n$, we consider the assignment 

    $F(A_i) = v_1$ for $0 < i \leq i_1$
    $F(A_i) = v_2$ for $i_1 < i \leq i_1 + i_2$
    ...
    $F(A_i) = v_m$ for $i_1 + i_2 + \cdots + i_{m-1} < i \leq n$

It is a straightforward combinatorial exercise to include this 
in the algorithm (but it complicates the description). 
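One way to picture the multi-valued generalization is to enumerate the count vectors $(i_1, \ldots, i_m)$ together with the number of ground assignments each one represents, which is the multinomial coefficient $n!/(i_1! \cdots i_m!)$. This is a sketch of that enumeration only, under the assumption that the multinomial plays the role the binomial plays in the Boolean case; the function name is hypothetical.

```python
from math import factorial

def count_vectors(n, m):
    """Yield (counts, multiplicity) for every way to give m values to n
    individuals, identified only by how many individuals take each value."""
    def compositions(remaining, parts):
        if parts == 1:
            yield (remaining,)
            return
        for first in range(remaining + 1):
            for rest in compositions(remaining - first, parts - 1):
                yield (first,) + rest

    for counts in compositions(n, m):
        multiplicity = factorial(n)
        for c in counts:
            multiplicity //= factorial(c)  # exact at every step
        yield counts, multiplicity

vectors = list(count_vectors(4, 3))
```

With $m = 2$ this reduces to the Boolean case: the count vectors are $(i, n - i)$ with multiplicity $\binom{n}{i}$.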

4 Conclusion 

Lifted probabilistic reasoning has proved to be challenging. 
There have been many proposals to lift various algorithms; 
however, all of the exact algorithms needed to ground out 
a population in some cases (and it is often difficult to tell 
for which cases they need to ground a population). We set 
out to determine if there was some fundamental reason why 
we would need to ground out the representation, or whether 
there was some case where we needed to effectively ground 
out. We believe that we have answered this for two cases: 

• When lifted inference is polynomial in a population, 
which occurs when VE does not create a factor that 
is parametrized by a population or search can be dis- 
connected for a population, we can solve it in time 
polynomial in the logarithm of the population. 

• For parametrized random variables with zero or a single 
argument, where search-based inference (and so also 
variable elimination, due to their equivalent complexity) 
is exponential when grounding, we answer arbitrary 
conditional queries in time polynomial in the 
population. 

The question of whether we can always do lifted inference 
in polynomial time in each population size when there are 
PRVs with more than one argument is still an open problem. 
While we can use the algorithm in this paper for many 
of these cases, there are some very tricky cases. Hopefully 
the results in this paper will provide tools to fully solve this 
problem. 

We have chosen not to give empirical comparisons of our 
results. Such comparisons would be much more about the 
low-level engineering than about the lifted algorithm. There 
are no published algorithms that can correctly solve all of 
the examples in this paper in a fully lifted form. 